CS 294-1 Assignment 2 Report
Abstract
In this report, we describe our implementation of a linear regression method for classifying numerically scored sentiment data. The dataset, collected by Mark Dredze and others at Johns Hopkins, records 1M amazon.com book reviews. The linear regression classification starts by reading tokenized data and building a word-count map, and then trains a linear classifier by minimizing an error term. We conduct 10-fold cross-validation to evaluate our method. We describe our implementation of each of these steps, as well as enhancements such as stop-word removal.
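The pipeline summarized in the abstract (word counts, a linear model fit by minimizing an error term, k-fold cross-validation, stop-word removal) can be illustrated with a minimal Python sketch. This is not the report's implementation. The stop-word list, the toy reviews, the function names, and the choice of plain stochastic gradient descent on squared error are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical stop-word list; the report does not say which list it used.
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "it", "this", "was"}


def featurize(tokens):
    """Map a tokenized review to sparse word counts, dropping stop words."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def predict(weights, bias, counts):
    """Linear score: bias plus the weighted sum of word counts."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in counts.items())


def train_sgd(examples, epochs=200, lr=0.05):
    """Fit weights by stochastic gradient descent on squared error.

    SGD is an assumption here; the abstract only says the error term
    is minimized.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for counts, score in examples:
            err = predict(weights, bias, counts) - score
            bias -= lr * err
            for w, c in counts.items():
                weights[w] = weights.get(w, 0.0) - lr * err * c
    return weights, bias


def cross_validate(examples, k=10, seed=0):
    """Return mean squared error averaged over k held-out folds.

    Requires at least k examples so that no fold is empty.
    """
    data = examples[:]
    random.Random(seed).shuffle(data)
    fold_errors = []
    for i in range(k):
        test = data[i::k]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        weights, bias = train_sgd(train)
        mse = sum((predict(weights, bias, c) - y) ** 2 for c, y in test) / len(test)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Made-up reviews with 1-5 star scores, for illustration only.
toy_reviews = [
    ("great book loved it".split(), 5.0),
    ("terrible boring book".split(), 1.0),
    ("loved the story".split(), 5.0),
    ("boring and terrible plot".split(), 1.0),
]
toy_examples = [(featurize(tokens), score) for tokens, score in toy_reviews]
```

Calling `train_sgd(toy_examples)` returns a weight map and a bias, and `cross_validate(toy_examples, k=2)` gives a held-out error estimate on the toy data. At the scale of the real dataset, a sparse-matrix representation would likely be needed in place of Python dictionaries.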
Similar resources
CS 294-1: Assignment 1 Naive Bayes Classification with Improvements
The main objective of this assignment was to implement a Naive Bayes classifier and attempt certain improvements upon the vanilla version. A major challenge was to implement the classifier in Scala using the two libraries scalala and scalanlp. This report presents details regarding the different experiments I tried out, namely varying the smoothing parameter, feature selection, n-gram models an...
CS 294-1: Assignment 2 A Large-Scale Linear Regression Sentiment Model
The primary objective of this assignment was to build a linear regression sentiment model based on amazon.com reviews. The main challenge was handling moderately large amounts of data on a single machine. The variations that I tried include the following: exact solution (L2 loss and ridge regularization), stochastic gradient with different training schemes and initialization,...
CS 294-1 Assignment 1 Report
Text classification has a growing range of potential applications in areas such as recommender systems and customer service. The goal of this assignment is to apply a Naive Bayes classifier to a data set of labeled textual movie reviews and to practice Scala/ScalaNLP. The data set “Polarity dataset v2.0” is from http://www.cs.cornell.edu/People/pabo/movie-reviewdata/, created by...
U.C. Berkeley Handout N10, CS 294: Pseudorandomness and Combinatorial Constructions
Today we will study some conditions under which a very powerful pseudorandom generator can be shown to exist, and also some consequences of the existence of such a pseudorandom generator. We will start by assuming the existence of a permutation p : {0, 1}^n → {0, 1}^n which is computable in poly(n) time and which, for some constant δ > 0, is (2^{δn}, 2^{−δn})-one way. (This is an extremely strong assum...
CS 294-1 A1: Naive Bayesian Classifier
Settings. Our code was written in Scala and built with Simple Build Tool (SBT). The programs were run on Mac OS. We test the effectiveness of our implementation in various respects. Unless stated otherwise, we adopt the following default settings. We report macro-averaged F1 measures, further averaged over ten-fold cross-validation. We consider both “Bernoulli” and “Multinomia...
Publication date: 2012